Wait, What’s Happening
Deploying your applications to a cluster is just the first step for running containers in production, and it’s important to think about operations and scenarios around your deployments. It is valuable to have a holistic understanding of your cluster when it comes to ensuring your applications are reliable, available, and tolerant to failures.
NOTE: From this challenge on, your team may only advance if your cluster is in a healthy state.
Challenge
Your CTO is impressed with the speed at which you were able to deploy the application, but now wants to see how your application is performing. The task the CTO has given you for this challenge is to make sure your cluster is ‘production ready’ by implementing a monitoring solution that improves the observability of your cluster, and by adding alerts for key metrics so you can get ahead of any issues that may occur.
First, choose and implement a monitoring solution for your team to use. While choosing a monitoring solution, think about the four main components that must be considered to fully understand what is happening with your cluster so you can answer critical questions your CTO will ask.
- Applications running on the containers
- Containers
- Underlying Virtual Machines
- Kubernetes API
You can use the simulator from the previous step to create load on your application for testing and monitoring purposes.
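One option that covers all four components on AKS is Azure Monitor Container Insights, which the References section below links to. As a sketch, the add-on can be enabled from the Azure CLI; `<resource-group>`, `<cluster-name>`, and the workspace name here are placeholders for your team's own values, and other stacks (for example Prometheus with Grafana) are equally valid choices.

```shell
# Create (or reuse) a Log Analytics workspace to receive cluster telemetry.
az monitor log-analytics workspace create \
  --resource-group <resource-group> \
  --workspace-name tripinsights-logs

# Enable the Container Insights monitoring add-on on the existing cluster,
# pointing it at that workspace.
az aks enable-addons \
  --resource-group <resource-group> \
  --name <cluster-name> \
  --addons monitoring \
  --workspace-resource-id $(az monitor log-analytics workspace show \
      --resource-group <resource-group> \
      --workspace-name tripinsights-logs \
      --query id -o tsv)
```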
A new application
Your focus on security and monitoring has encouraged other teams to try out AKS. Your CTO has asked you to run the following deployment in order to test a new project that calculates insurance rates faster and more accurately than ever before. This new project is intended to eventually become part of the TripInsights application. Ensure that you have configured your monitoring solution so you can quickly identify any issues that might arise.
Replace the image reference below with a reference to your ACR. The insurance application image is already deployed into your ACR.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: insurance-deployment
  labels:
    deploy: insurance
spec:
  replicas: 2
  selector:
    matchLabels:
      app: insurance
  template:
    metadata:
      labels:
        app: insurance
    spec:
      containers:
        - image: "replaceme.io/insurance:1.0"
          imagePullPolicy: Always
          name: insurance
          ports:
            - containerPort: 8081
              name: http
              protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: insurance
spec:
  type: ClusterIP
  selector:
    app: insurance
  ports:
    - protocol: TCP
      name: insurance-http
      port: 80
      targetPort: 8081
```
You can verify that the app is running by visiting the service endpoint, which should return your calculation.
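Because the Service is of type ClusterIP, it is not reachable from outside the cluster. One quick way to check it, sketched below, is a port-forward plus curl; the local port and the root path are assumptions, so adjust them if the application serves on a different route.

```shell
# Forward local port 8080 to the insurance Service's port 80, then request it.
kubectl port-forward service/insurance 8080:80 &
curl http://localhost:8080/
```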
Of course, you didn’t write this application, so it isn’t up to your standards… Using your recently deployed monitoring solution, observe the new application in your cluster and see if you can determine its runtime behavior. If you find any issues, fix them in the deployment and create alerts for anything that might cause your application or cluster to experience downtime.
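One fix the success criteria call for is adding resource requests and limits to the insurance container. The fragment below sits under the container entry in the Deployment spec; the numbers are placeholder values only, and should be sized from what your monitoring actually shows the container using.

```yaml
# Illustrative fragment for the insurance container spec.
# The values are assumptions; tune them from observed usage.
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi
```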
Success Criteria
- Your team must create a monitoring solution that shows the runtime behaviors of the application. You must be able to answer the following questions:
- How many requests are coming to your cluster?
- How much memory is allocatable per node in your cluster?
- What is the CPU usage of your workload? What is the CPU usage of internal Kubernetes tools?
- How many pods are currently pending?
- Which pod is consuming the most memory?
- Your team must deploy a set of tools that will allow you to monitor your cluster and its applications.
- Your team must demonstrate where to obtain logs for the 4 main components mentioned in the first section of this page.
- Your team must successfully implement resource limits on the newly deployed application
- Your team must set up an alert that informs you if an application is nearing resource limits in order to prevent cluster-wide issues
- Your team must demonstrate your cluster is overall “Healthy” for 15 minutes
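If your team chooses Container Insights, some of the questions above map to short Kusto queries in Log Analytics. The sketches below assume the standard Container Insights tables (`Perf`, `KubePodInventory`); verify the table and counter names against your own workspace.

```kusto
// Allocatable memory per node
Perf
| where ObjectName == "K8SNode" and CounterName == "memoryAllocatableBytes"
| summarize arg_max(TimeGenerated, CounterValue) by Computer

// Pods currently pending
KubePodInventory
| where PodStatus == "Pending"
| summarize arg_max(TimeGenerated, PodStatus) by Name

// Pods consuming the most memory (working set)
Perf
| where ObjectName == "K8SContainer" and CounterName == "memoryWorkingSetBytes"
| summarize AvgWorkingSet = avg(CounterValue) by InstanceName
| top 5 by AvgWorkingSet desc
```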
References
Monitoring Microservices
Azure
- Azure Container Insights reference
- Azure Container Insights Agent Config
- Search Logs to Analyze Data
- Kusto Query Language Reference
- Container Insights Alerts
Prometheus
OpenHack
Hello and welcome to OpenHack, a challenge oriented hack event from Microsoft. You will be presented with a series of challenges, each one more difficult than the one before.
You should already be assigned to and seated with a team, with whom you will attempt to solve as many challenges as you can within today’s hack time.
You have been assigned a coach who will be your first point of contact, and is here to support you and answer questions during the hack. They will not, however, solve the challenges for you.
You may notice a resource group called teamResources in your Azure subscription. This resource group contains any pre-provisioned resources referenced in the challenges.
The Premise
You work for Humongous Insurance. One of their products provides customers the opportunity to qualify for lower car insurance rates. Customers can do this by opting in to use Humongous Insurance’s TripInsights app, which collects data about their driving habits. Your team has been assigned to modernize the application and move it to the cloud.
The TripInsights application, once a monolith, has been refactored into a number of microservices:
- Trip Viewer WebApp (.NET Core): Your customers use this web application to review their driving scores and trips. The trips are being simulated against the APIs within the OpenHack environment.
- Trip API (Go): The mobile application sends the vehicle’s on-board diagnostics (OBD) trip data to this API to be stored.
- Points of Interest API (.NET Core): This API is used to collect the points of the trip when a hard stop or hard acceleration was detected.
- User Profile API (NodeJS): This API is used by the application to read the user’s profile information.
- User API (Java): This API is used by the application to create and modify the users.
The source code of all the microservices is available here.
The Challenges
Each challenge will lead you through a stage of the technical investigation as briefly laid out by your fictional CTO. These investigations become more technically challenging as you progress.
We do not provide guides or instructions to solve the challenges, just a few hints and documentation references that you may find useful. There are multiple ways to solve each challenge, and very likely some we haven’t thought of. We’re interested in seeing your own unique solutions to each problem, and you should absolutely work with your coaches and the OpenHack Team to validate your solution as correct.
One final tip: Read everything very carefully
The OpenHack team have worked hard to ensure each problem is solvable. All the details you should need are within the challenge briefs, which are very carefully written and worded to give you clues toward the solution. Reading them fully is the best way to figure out a solution, as small points can be easily missed. Your coaches will help to fill gaps in your understanding, provided you ask them the right questions.